\documentclass{article}
\usepackage{url}
\usepackage{hyperref}
\hypersetup{
    colorlinks=true,
    linkcolor=blue,
    filecolor=magenta,      
    urlcolor=cyan,
}
\usepackage{graphicx}

\newcommand{\includeimage}[2][]{%
%HEVEA\imgsrc{#2.hevea.png}%
%BEGIN LATEX
\includegraphics[#1]{#2}
%END LATEX
}

% BEAST book specific commands
\newcommand{\BEASTVersion}{2.3.x}
\newcommand{\TracerVersion}{1.6}
\newcommand{\FigTreeVersion}{1.4.2}

\newcommand{\chainLength}{{5,000,000}}
\newcommand{\logEvery}{{5,000}}
\newcommand{\screenEvery}{{10,000}}


\begin{document}
\title{*BEAST in BEAST {\BEASTVersion}\\
Estimating Species Trees from Multilocus Data}

\author{Joseph Heled, Remco Bouckaert, Walter Xie and Alexei J Drummond }

\maketitle

\section{Introduction}

In this tutorial we describe a full Bayesian framework for species tree estimation. The statistical methodology described in this tutorial is known by the acronym *BEAST (pronounced "star beast") \cite{Heled:2010fk}.

You will need the following software at your disposal:

\begin{itemize}

\item {\bf BEAST} \cite{beast}- this package contains the BEAST program, BEAUti, TreeAnnotator and other utility programs. This tutorial is written for BEAST v{\BEASTVersion}, which has support for multiple partitions. It is available for download from \texttt{http://beast2.org}.
\item {\bf Tracer} - this program is used to explore the output of BEAST (and other Bayesian MCMC programs). It graphically and
quantitively summarizes the distributions of continuous parameters and provides diagnostic information. At the time of
writing, the current version is v{\TracerVersion}. It is available for download from \texttt{http://tree.bio.ed.ac.uk/software/figtree/}.
\item {\bf FigTree} - this is an application for displaying and printing molecular phylogenies, in particular those obtained using
BEAST. At the time of writing, the current version is v{\FigTreeVersion}. It is available for download from \texttt{http://tree.bio.ed.ac.uk/software/tracer/}.
\end{itemize}

\section{*BEAST}

This tutorial will guide you through the analysis of three loci sampled from 26 individuals representing nine species of pocket gophers. This is a subset of previous published data \cite{belfiore2008multilocus}. The objective of this tutorial is to estimate the species tree that is most probable given the multi-individual multi-locus sequence data. The species tree has nine taxa, whereas each gene tree has 26 taxa. *BEAST 2 will co-estimate three gene trees embedded in a shared species tree \cite[for details]{Heled:2010fk}.

The first step will be to convert a NEXUS file with a DATA or CHARACTERS block into a BEAST XML input file. This is done using the program BEAUti (Bayesian Evolutionary Analysis Utility). This is a user-friendly program for setting the evolutionary model and options for the MCMC analysis. The second step is to actually run BEAST using the input file that contains the data, model and settings. The final step is to explore the output of BEAST in order to diagnose problems and to summarize the results.

\subsection*{BEAUti}

Run BEAUti by double clicking on its icon. 

\subsubsection*{Set up BEAUti for *BEAST}

*BEAST uses a different template from the standard. This means that to use BEAUti for *BEAST, the first thing to do is change the template. Choose the File/Template/StarBeast item. When changing a template, BEAUti deletes all previously imported data and start with a new empty template. So, if you already loaded some data, a warning message pops up indicating that this data will be lost if you switch templates.

\begin{figure}
\centering
\includeimage[scale=0.4,clip=true,trim=0 300 200 0]{figures/BEAUti_selectTemplate}

\caption{\label{fig.template} Select a new template in BEAUti.}
\end{figure}


\def\mlstname{*BEAST}

\subsubsection*{Loading the NEXUS file }

%% JH
\mlstname{} is a multi-individual, multi-locus method method. The data for each
locus is stored as one alignment in its own NEXUS file. Taxa names in each
alignment have to be unique, but duplicates across alignments are fine.
%% JH

To load a NEXUS format alignment, simply select the \texttt{Import
Alignment} option from the File menu: 

Select three files called \href{https://github.com/CompEvol/beast2/blob/master/examples/nexus/26.nex?raw=true}{26.nex}, \href{https://github.com/CompEvol/beast2/blob/master/examples/nexus/29.nex?raw=true}{29.nex}, \href{https://github.com/CompEvol/beast2/blob/master/examples/nexus/47.nex?raw=true}{47.nex} by holding \texttt{shift} key. 
You can find the files in the {\tt examples/nexus} directory in the directory where BEAST was installed. 
Or click the links to download the data. After the data is opened in your web browser each time, right click mouse and separately save them as \texttt{26.nex, 29.nex, 47.nex}.

Each file contains an alignment of sequences of from an independent locus. The \texttt{26.nex} looks like this (content has been truncated):

\begin{verbatim}
#NEXUS
[TBO26oLong]
BEGIN DATA;
	DIMENSIONS  NTAX =26 NCHAR=614;
	FORMAT DATATYPE = DNA GAP = - MISSING = ?;
	MATRIX	
	Orthogeomys_heterodus        ATTCTAGGCAAAAAGAGCAATGC ...
	Thomomys_bottae_awahnee_a    ????????????????????ATGCTG ...
	Thomomys_bottae_awahnee_b    ????????????????????ATGCTG ...
	Thomomys_bottae_xerophilus   ????????????????????ATGCTG ...
	Thomomys_bottae_cactophilus  ????????????????AGCAATGCT ...

         ... ...

;
END;
\end{verbatim}

\medskip{}

Once loaded, the three partitions are displayed in the main panel.
You can double click any alignment (partition) to show its detail.

\begin{figure}
\centering
\includeimage[scale=0.32,clip=true,trim=0 300 0 0]{figures/BEAUti_DataPartitions}

\caption{\label{fig.datapartition} Data partition panel after loading alignments.}
\end{figure}


For multi-locus analyses, BEAST can link or unlink substitutions models across the loci by clicking buttons on the top of {\bf Partitions} panel. The default of *BEAST is unlinking all models: substitution model, clock model, and tree models. Note that you should only unlink the tree model across data partitions that are actually genetically unlinked. For example, in most organisms all the mitochondrial genes are effectively linked due to a lack of recombination and they should be set up to use the same tree model in a *BEAST analysis. 

\subsubsection*{Import trait(s) from a mapping file to fire *BEAST}

%%JH
Each taxon in a \mlstname{} analysis is associated with a species. Typically the
species name is already embedded inside the taxon. The species name should be
easy to extract; place it either at the beginning or the end, separated by a
``special'' character which does not appear in names. For example,
\texttt{aria\_334259, coast\_343436} (using an underscore) or
\texttt{10x017b.wrussia, 2x305b.eastis} (using a dot).
%%JH

We need to tell BEAUti somehow which lineages in the alignments go with taxa in the species tree. Select the Taxon Set panel, and a list of taxa from the alignments is shown together with a default guess by BEAUti. In this case, the guess is not very good, so we want to change this. You can manually change each of the entries in the table, or press the guess button and a dialog is shown where you can choose from several ways to try to detect the taxon from the name of the lineages, or have a mapping stored in a file. In this case, splitting the name on the underscore character ('\_') and selecting the second group will give us the mapping that we need.

\begin{figure}
\centering
\includeimage[width=\textwidth]{figures/BEAUti_Guess_Taxonsets}

\caption{\label{fig.taxonset} Selecting taxon sets in BEAUti using the guess dialog from the taxon set panel.}
\end{figure}


Alternatively, the mapping can be read from a trait file.
A proper trait file is tab delimited. The first row is always \texttt{traits} followed by the keyword \texttt{species} in the second column and separated by tab. The rest of the rows map each individual taxon name to a species name: the taxon name in the first column and species name in the second column separated by tab. For example:

\begin{verbatim}
traits	species
taxon1	speciesA
taxon2	speciesA
taxon3	speciesB
... ...
\end{verbatim}



\subsubsection*{Setting the substitution model}

The next thing to do is to click on the {\bf Site Model} tab at the top of the
main window. This will reveal the evolutionary model settings for
BEAST. Exactly which options appear depend on whether the data are
nucleotides, or amino acids, or binary data, or general data.
The settings that will appear after loading the data set will
be the default values so we need to make some changes. 

Most of the models should be familiar to you. For this analysis, we
will select each substitution model listed on the 
left side in turn to make the following change: select HKY for substitution model
and \textbf{Empirical} for the \textbf{Frequencies}. 
\begin{figure}
\centering
\includeimage[width=\textwidth]{figures/BEAUti_Site_Model}

\caption{\label{fig.sitemodel} Setting up substitution and site models for the gopher alignments.}
\end{figure}


\subsubsection*{Setting the clock model}

Second, click on the {\bf Clock Models} tab at the top of the
main window. In this analysis, we use the \textbf{Strict Clock} molecular clock model as default.
Your model options should now look like this: 

\begin{figure}
\centering
\includeimage[scale=0.3,clip=true,trim=0 450 0 0]{figures/BEAUti_ClockModel1}

\centering
\includeimage[scale=0.3,clip=true,trim=0 450 0 0]{figures/BEAUti_ClockModel2}

\centering
\includeimage[scale=0.3,clip=true,trim=0 450 0 0]{figures/BEAUti_ClockModel3}

\caption{\label{fig.clockmodel} Setting up clock models for the gopher alignments.}
\end{figure}


The \textbf{Estimate} check box is unchecked for the first clock model and checked for the rest clock models, because we wish to estimate the substitution rate of each subsequent locus relative to the first locus whose rate is fixed to 1.0. 

\subsubsection*{Multi Species Coalescent}

The {\bf Multi Species Coalescent} panel allows settings to the multi species coalescent model to be specified for each tree. *BEAST has a different tree prior panel where users can only configure the species tree prior not gene tree priors (which are automatically specified by the multispecies coalescent). 
Currently, three population size models are available: \textbf{Piecewise linear and constant root}, \textbf{Piecewise linear}, and \textbf{Piecewise constant}. In this analysis, we use \textbf{piecewise linear and constant root}.

The \textbf{Ploidy} item determines the type of sequence (mitochondrial, nuclear, X, Y). This matters since different modes of inheritance gives rise to different effective population sizes. In this analysis, we simply use a random starting tree. 

\begin{figure}
\centering
\includeimage[scale=0.3,clip=true,trim=0 150 0 0]{figures/BEAUti_MSP}

\caption{\label{fig.MSP} Setting up multi species coalescent parameters.}
\end{figure}


\subsubsection*{Priors and Operators}

The {\bf Priors} panel allows priors to be specified for each parameter in the model. 
Several models are available as tree priors, however, only the Yule or the Birth-Death Model are suitable for *BEAST analyses. Leave the tree prior setting at the default \textbf{Yule model}. The {\bf Operators} panel (hidden) is used to configure technical settings that affect the efficiency of the MCMC program. We leave these two panels unchanged in this analysis.

\subsubsection*{Setting the MCMC options }

The next tab, {\bf MCMC}, provides more general
settings to control the length of the MCMC and the file names. 

Firstly we have the \textbf{Length of chain}. This is the number of
steps the MCMC will make in the chain before finishing. The appropriate length of the chain depends on the size of the data set, the complexity of the
model and the accuracy of the answer required. The default value of 10,000,000
is entirely arbitrary and should be adjusted according to the size
of your data set. For this data set let's set the \texttt{chain
length} to \underline{\chainLength{}} as this will run reasonably quickly on most modern
computers (a few minutes).

The next options specify how often the parameter values in the Markov
chain should be displayed on the screen and recorded in the log file.
The screen output is simply for monitoring the programs progress so
can be set to any value (although if set too small, the sheer quantity
of information being displayed on the screen will actually slow the
program down). For the log file, the value should be set relative
to the total length of the chain. Sampling too often will result in
very large files with little extra benefit in terms of the precision
of the analysis. Sample too infrequently and the log file will not
contain much information about the distributions of the parameters. 
You probably want to aim to store no more than 10,000 samples so this should be
set to no less than chain length / 10,000.

For this exercise we will set the \texttt{screen log} to \underline{\screenEvery{}} and the \texttt{trace log} to \underline{\logEvery{}}. 
Set the file name of the trace log to \texttt{gopher.log} and that of the speciesTreeLogger to \texttt{gopher.species.trees}.

%If you would like to save the operator analysis into a file, you need to check  \textbf{Create operator analysis file} which will generate a file with the suffix \texttt{.ops}. 

\begin{figure}
\centering
\includeimage[width=\textwidth]{figures/BEAUti_MCMC}

\caption{\label{fig.MCMC} Setting up the MCMC paremeters.}
\end{figure}


If you are using windows then we suggest you add the suffix \texttt{.txt} to both of these (so,
\texttt{gopher.log.txt} and \texttt{gopher.species.trees.txt}) so that Windows recognizes
these as text files. 

\subsubsection*{Generating the BEAST XML file }

We are now ready to create the BEAST XML file. To do this, either select the {\bf File/Save} or {\bf File/Save As} option from the \textbf{File} menu. Check the default priors setting and click \textbf{Continue}. Save the file with an appropriate name (we usually end the filename with \texttt{.xml}, i.e., \texttt{gopher.xml}). We are now ready to run the file through BEAST. 

\subsection*{Running BEAST }

Now run BEAST and when it asks for an input file, provide your newly
created XML file as input by click \textbf{Choose File ...}, and then click \textbf{Run}. 

\begin{figure}
\centering
\includeimage[width=0.8\textwidth]{figures/BEAST}

\caption{\label{fig.BEAST} Launching BEAST.}
\end{figure}


BEAST will then run until it has finished
reporting information to the screen. The actual results files are
saved to the disk in the same location as your input file. The output to the screen will
look something like this: 

{\tiny   
\begin{verbatim}
                      BEAST v2.3.1, 2002-2015
             Bayesian Evolutionary Analysis Sampling Trees
                       Designed and developed by
 Remco Bouckaert, Alexei J. Drummond, Andrew Rambaut & Marc A. Suchard
                                    
                     Department of Computer Science
                         University of Auckland
                        remco@cs.auckland.ac.nz
                        alexei@cs.auckland.ac.nz
                                    
                   Institute of Evolutionary Biology
                        University of Edinburgh
                           a.rambaut@ed.ac.uk
                                    
                    David Geffen School of Medicine
                 University of California, Los Angeles
                           msuchard@ucla.edu
                                    
                      Downloads, Help & Resources:
                           http://beast2.org/
                                    
  Source code distributed under the GNU Lesser General Public License:
                   http://github.com/CompEvol/beast2
                                    
                           BEAST developers:
   Alex Alekseyenko, Trevor Bedford, Erik Bloomquist, Joseph Heled, 
 Sebastian Hoehna, Denise Kuehnert, Philippe Lemey, Wai Lok Sibon Li, 
Gerton Lunter, Sidney Markowitz, Vladimir Minin, Michael Defoin Platel, 
                 Oliver Pybus, Chieh-Hsi Wu, Walter Xie
                                    
                               Thanks to:
          Roald Forsberg\includeimage, Beth Shapiro and Korbinian Strimmer


... ...


        4980000     -3827.2093       279.2        -4293.2486        22.1146 24s/Msamples
        4990000     -3842.9689       276.3        -4295.6213        17.6877 24s/Msamples
        5000000     -3798.1962       278.2        -4280.7865        22.0904 24s/Msamples

Operator                                           Tuning    #accept    #reject      Pr(m)  Pr(acc|m)
NodeReheight(Reheight.t:Species)                        -     205317     486572     0.1386     0.2967 
ScaleOperator(popSizeBottomScaler.t:Species)       0.1902      10224      26338     0.0074     0.2796 
ScaleOperator(popMeanScale.t:Species)              0.4890       5795      16242     0.0044     0.2630 
UpDownOperator(updown.all.Species)                 0.5029      38885     108717     0.0295     0.2634 
ScaleOperator(YuleBirthRateScaler.t:Species)       0.2416       7155      15125     0.0044     0.3211 
ScaleOperator(treeScaler.t:26)                     0.7580       8510      42962     0.0102     0.1653 
ScaleOperator(treeRootScaler.t:26)                 0.4424      11487      39392     0.0102     0.2258 
Uniform(UniformOperator.t:26)                           -     287517     225590     0.1024     0.5603 
SubtreeSlide(SubtreeSlide.t:26)                    0.5560        418     256419     0.0512     0.0016 Try decreasing size to about 0.278
Exchange(narrow.t:26)                                   -     115438     140923     0.0512     0.4503 
Exchange(wide.t:26)                                     -       1363      49988     0.0102     0.0265 
WilsonBalding(WilsonBalding.t:26)                       -       1882      49890     0.0102     0.0364 
ScaleOperator(StrictClockRateScaler.c:29)          0.4744      13355      37595     0.0102     0.2621 
ScaleOperator(treeScaler.t:29)                     0.7567       8137      42785     0.0102     0.1598 
ScaleOperator(treeRootScaler.t:29)                 0.4101      12191      38887     0.0102     0.2387 
Uniform(UniformOperator.t:29)                           -     296262     216280     0.1024     0.5780 
SubtreeSlide(SubtreeSlide.t:29)                    0.5724        339     255443     0.0512     0.0013 Try decreasing size to about 0.286
Exchange(narrow.t:29)                                   -     117912     138109     0.0512     0.4606 
Exchange(wide.t:29)                                     -       2229      49303     0.0102     0.0433 
WilsonBalding(WilsonBalding.t:29)                       -       2459      49001     0.0102     0.0478 
UpDownOperator(updown.29)                          0.7793      10314      40694     0.0102     0.2022 
ScaleOperator(StrictClockRateScaler.c:47)          0.5490      12974      38097     0.0102     0.2540 
ScaleOperator(treeScaler.t:47)                     0.7363       8746      42523     0.0102     0.1706 
ScaleOperator(treeRootScaler.t:47)                 0.4742       7774      43670     0.0102     0.1511 
Uniform(UniformOperator.t:47)                           -     276942     234361     0.1024     0.5416 
SubtreeSlide(SubtreeSlide.t:47)                    0.5044        307     255119     0.0512     0.0012 Try decreasing size to about 0.252
Exchange(narrow.t:47)                                   -      91846     164747     0.0512     0.3579 
Exchange(wide.t:47)                                     -        731      50281     0.0102     0.0143 
WilsonBalding(WilsonBalding.t:47)                       -        937      50377     0.0102     0.0183 
UpDownOperator(updown.47)                          0.7397      10853      40537     0.0102     0.2112 
UpDownOperator(strictClockUpDownOperator.c:47)     0.7278       9987      41373     0.0102     0.1945 
UpDownOperator(strictClockUpDownOperator.c:29)     0.7761      10040      40724     0.0102     0.1978 
ScaleOperator(KappaScaler.s:26)                    0.3401        469       1197     0.0003     0.2815 
ScaleOperator(KappaScaler.s:29)                    0.2955        436       1231     0.0003     0.2615 
ScaleOperator(KappaScaler.s:47)                    0.3199        379       1298     0.0003     0.2260 
ScaleOperator(popSizeTopScaler.t:Species)          0.1910      23665      54936     0.0157     0.3011 

     Tuning: The value of the operator's tuning parameter, or '-' if the operator can't be optimized.
    #accept: The total number of times a proposal by this operator has been accepted.
    #reject: The total number of times a proposal by this operator has been rejected.
      Pr(m): The probability this operator is chosen in a step of the MCMC (i.e. the normalized weight).
  Pr(acc|m): The acceptance probability (#accept as a fraction of the total proposals for this operator).


Total calculation time: 135.584 seconds
End likelihood: -3798.196207797654
\end{verbatim}}

\subsection*{Analyzing the results}

Run the program called {\bf Tracer} to analyze the output of BEAST. When the main
window has opened, choose {\bf Import Trace File...} from the {\bf File} menu and select the file that
BEAST has created called \texttt{gopher.log}.
You should now see a window like in Figure \ref{fig.tracer1}.

\begin{figure}
\centering
\includeimage[width=\textwidth]{figures/Tracer1}

\caption{\label{fig.tracer1} Tracer with the gopher data.}
\end{figure}


Remember that MCMC is a stochastic algorithm so the actual numbers will not be exactly the same.

On the left hand side is a list of the different quantities that BEAST has logged. There are traces for the posterior (this
is the log of the product of the tree likelihood and the prior probabilities), and the continuous parameters. Selecting a trace
on the left brings up analyses for this trace on the right hand side depending on tab that is selected. When first opened, the
`posterior' trace is selected and various statistics of this trace are shown under the Estimates tab.
In the top right of the window is a table of calculated statistics for the selected trace. 

Tracer will plot a (marginal posterior) distribution for the selected parameter and also give you statistics such as the mean and median. The \texttt{95\% HPD lower} or \texttt {upper} stands for {\it highest posterior density interval} and represents the most compact interval on the selected parameter that contains 95\% of the posterior probability. It can be thought of as a Bayesian analog to a confidence interval. 

Select the \texttt{treeModel.rootHeight} parameter and the next three (hold shift whilst selecting). This will show a display of the
age of the root and the three gene trees. If you switch the tab at the top of the window to {\bf Marginal Density} then you will get a plot of the marginal posterior densities of each of these date estimates overlayed,
as shown in Figure \ref{fig.tracer2}.

\begin{figure}
\centering
\includeimage[width=\textwidth]{figures/Tracer2}

\caption{\label{fig.tracer2} Tracer showing the root heights of the lineage trees.}
\end{figure}


\subsection*{Obtaining an estimate of the phylogenetic tree}

BEAST also produces a sample of plausible trees. 
These can be summarized using the program {\bf TreeAnnotator}. This will take the set of trees and identify a single tree that best represents the posterior distribution. It will then annotate this selected tree topology with the mean ages of all the
nodes as well as the 95\% HPD interval of divergence times for each clade in the selected tree. It will also calculate the posterior clade probability for each
node. Run the {\bf TreeAnnotator} program and set it up to look like in Figure \ref{fig.TreeAnnotator}.

\begin{figure}
\centering
\includeimage[width=0.8\textwidth]{figures/TreeAnnotator}

\caption{\label{fig.TreeAnnotator} Using TreeAnnotator to summarise the tree set.}
\end{figure}


The burnin is the number of trees to remove from the start of the sample. Unlike {\bf Tracer} which specifies the number of steps as a burnin, in {\bf TreeAnnotator} you need to specify the actual number of trees. For this run, we use the default setting.

The {\bf Posterior probability limit} option specifies a limit such that if a node is found at less than this frequency in the sample of trees (i.e., has a posterior probability less than this limit), it will not be annotated. The default of 0.5 means that only nodes seen in the majority of trees will be annotated. Set this to zero to annotate all nodes.

For {\bf Target tree type} you can either choose a specific tree from a file or ask TreeAnnotator to find a tree in your sample. The default option, {\bf Maximum clade credibility tree}, finds the tree with the highest product of the posterior probability of all its nodes.

Choose {\bf Mean heights} for node heights. This sets the heights (ages) of each node in the tree to the mean height across the entire sample of trees for that clade.

For the input file, select the trees file that BEAST created (by default this will be called \texttt{gopher.species.trees}) and select a file for the output (here we called it \texttt{gopher.species.tree}).

Now press \texttt{Run} and wait for the program to finish.

\subsection*{Viewing the Species Tree}

Finally, we can look at the tree in another program called {\bf FigTree}. Run this program, and open
the \texttt{gopher.species.tree} file by using the Open command in the File menu. The tree should appear.
You can now try selecting some of the options in the control panel on the left. Try selecting
{\bf Node Bars} to get node age error bars. Also turn on {\bf Branch Labels} and select {\bf posterior} to get
it to display the posterior probability for each node. Under {\bf Appearance} you can also tell FigTree
to colour the branches by the rate.
You should end up with something like Figure \ref{fig.figtree}.

\begin{figure}
\centering
\includeimage[width=\textwidth]{figures/figtree}

\caption{\label{fig.figtree} Figtree representation of the species tree.}
\end{figure}


Alternatively, you can load the species tree set into DensiTree and set it up as follows.

\begin{itemize}
\item Set burn-in to 500. The tree should not be collapsed any more.
\item Show a root-canal tree to guide the eye. Perhaps, the intensity of the trees is not large enough, so you might want to increase the intensity by clicking the icon in the button bar.
\item Show clades, their mean and 95\% HPD graphically, and posterior support using text. Now, too many clades are shown, and most are not of interest. Select 'Selected only', then open the clade toolbar (menu Window/View clade toolbar), and select only highly supported clades. Also, select the clade consisting of heterodus, bottea, umbinus and townsendii, to show that heterodus is an outgroup,
but there is some support (over 16\%) that it is not.
\item Drag the clade monticola and idahoensis up so that the 95\% HPD bar does not overlap with the one for mazama, monticola, idahoensis and talpoidis. Increase font size of the label for better readability.
\end{itemize}

The image should look something like Figure \ref{fig.DensiTree}.

\begin{figure}
\centering
\includeimage[scale=0.4]{figures/DensiTree}
\caption{\label{fig.DensiTree} DensiTree representation of the species tree.}
\end{figure}

Exercise: There is about 75\% support for heterodus to be an outgroup, and about 17\% for heterodus to be in a clade with bottea, umbinus and townsendii. Can you explain where the other 8\% went?


DensiTree can be used to show the branch widths of summary 
tree from tree annotator as population sizes. 
%Under `Line Width' in DensiTree, choose `dmv1' for the bottom and `dmv2' for the top.
Under `Line Width' in DensiTree, choose `BY\_METADATA\_NUMBER' for the bottom and for the top,
and choose numbers 2 and 3 in the 'top' and 'bottom' spinner.
Left, the bottom represents dmv1, the top dmv2 in the summary tree, which do not quite match 
in areas with little posterior support for the clades (see Figure \ref{fig.DensiTree} to
see which clades have little support).


Right, the top is matched up with the bottom of the branch above, using the `Make fit to bottom'
option for the top. This looks a bit prettier, but may not be quite accurate.

\begin{figure}
\centering
\includeimage[scale=0.18]{figures/species_population1}
\includeimage[scale=0.18]{figures/species_population2}
\end{figure}

Alternatively, a consensus tree can be generated by biopy (\url{https://github.com/jheled/biopy})
with using 1-norm left, and 2-norm right.

\begin{figure}
\centering
\includeimage[scale=0.18]{figures/species_population4}
\includeimage[scale=0.18]{figures/species_population5}
\end{figure}

Showing all consensus trees with population widths
(use By Metadata Pattern, for top and bottom and use {\begin{verbatim} .*dmv=.([^,]*).*\end{verbatim}} 
for the bottom pattern and {\begin{verbatim} .*dmv=.[^,]*,([^\}]*).*\end{verbatim}} for the top pattern.
This gives us this visualisation:

\begin{figure}
\centering
\includeimage[scale=0.25]{figures/species_population3}
\end{figure}


\subsection*{Comparing your results to the prior}

Using BEAUti, set up the same analysis but under the MCMC options, select the {\bf Sample from prior only} option. This will allow you to visualize the full prior distribution in the absence of your sequence data. Summarize the trees from the full prior
distribution and compare the summary to the posterior summary tree.



\section*{Bonus: Species delimitation using STACEY}

Arguably, some of the gopher species in the dataset analysed so far belongs to a single species, judging from the posterior tree distribution in Figure \ref{fig.DensiTree}. To test this, we can run the same analysis as above with STACEY, which stands for Species Tree And Classification Estimation, Yarely. You need the following:
\begin{itemize}
\item STACEY package, easiest installed using the package manager from the BEAUti File/Manage packages menu.
\item {\tt SpeciesDelimitationAnalyser}, available by downloading\\ {\tt http://www.indriid.com/2014/speciesDA.jar}.
\end{itemize}

You can find more documentation on how to set up an analysis and optimise some settings in the STACEY package documentation, which is part of the STACEY package (in the directory $<$BEAST package directory$>$/2.3/STACEY/doc/ where $<$BEAST package directory$>$ depends on the OS you are using. To find it, click the button with the question mark in the package manager dialog in BEAUti).

To set up the analysis in BEAUti do the same as for the *BEAST analysis with the following changes:
\begin{itemize}
\item use the STACEY template instead of the StarBeast template.
\item do {\bf not} change the tree prior in the priors tab.
\item for the tutorial, set the chain length to 2 million for a coffee break of about 2 minutes, or higher to get more acceptable ESSs (and a longer coffee break).
\end{itemize}

Save the XML file and run it with BEAST.

Analyse the log file using 
\begin{verbatim}
java -jar /path/to/speciesDA.jar -burnin 10 species_123456.trees out.txt
\end{verbatim}
where the number 123455 should be equal to the seed used to run BEAST. This creates a file {\tt out.txt} which looks something like this:

\newpage
{\tiny
\begin{verbatim}
count      fraction               similarity             nclusters  talpoides idahoensis townsendii umbrinus monticola bottae heterodus mazama 
16         0.08376963350785341    165.85714285714275     5          1    4    5    5    4    5    2    3    
19         0.09947643979057591    151.21428571428558     5          4    1    5    5    2    5    3    4    
104        0.5445026178010471     139.92857142857142     6          1    2    6    6    3    6    4    5    
2          0.010471204188481676   21.53571428571428      5          1    2    5    5    4    5    3    4    
11         0.05759162303664921    17.5                   5          4    1    5    5    4    5    2    3    
1          0.005235602094240838   15.85714285714286      6          5    1    2    6    5    6    3    4    
2          0.010471204188481676   15.000000000000002     4          3    3    4    4    3    4    1    2    
6          0.031413612565445025   14.57142857142857      7          1    2    3    7    4    7    5    6    
10         0.05235602094240838    10.0                   4          2    3    4    4    3    4    1    2    
2          0.010471204188481676   8.5                    5          1    4    5    5    2    5    3    4    
1          0.005235602094240838   7.607142857142858      8          1    2    3    4    5    6    7    8    
2          0.010471204188481676   4.7857142857142865     4          3    3    4    4    1    4    2    3    
3          0.015706806282722512   3.9642857142857144     5          4    4    5    5    1    5    2    3    
2          0.010471204188481676   3.857142857142857      6          1    5    2    6    5    6    3    4    
2          0.010471204188481676   2.928571428571429      7          1    2    7    3    4    7    5    6    
2          0.010471204188481676   2.0                    6          5    1    2    6    3    6    4    5    
2          0.010471204188481676   2.0                    4          3    1    4    4    3    4    2    3    
2          0.010471204188481676   2.0                    4          1    3    4    4    3    4    2    3    
1          0.005235602094240838   1.0                    7          1    2    7    7    3    4    5    6    
1          0.005235602094240838   1.0                    4          2    2    4    4    3    4    1    3 
\end{verbatim}
}

Note that the third line (with count=104) suggests more than half the time the taxa townsendi, umbrinus and bottea form a single species.


\bibliographystyle{plain}
\bibliography{StarBEAST_tutorial}

\end{document}

